identify all cDNA species, and the approach does not easily allow a systematic
screening. Analysis of gene expression by the study of proteins present in a cell or
tissue presents a favorable alternative. This can be achieved by use of two-dimensional
(2-D) gel electrophoresis, qualitative computer image analysis, and protein
identification techniques to create 'reference maps' of all detectable proteins. Such
reference maps establish patterns of normal and abnormal gene expression in the
organism, and allow the examination of some post-translational protein modifications
which are functionally important for many proteins. It is possible to screen proteins
systematically from reference maps to establish their identities.

To define protein-based gene expression analysis, the concept of the 'proteome'
was recently proposed (Wilkins et al., 1995; Wasinger et al., 1995). A proteome is the
entire PROTein complement expressed by a genOME, or by a cell or tissue type. The
concept of the proteome has some differences from that of the genome: while there
is only one definitive genome of an organism, the proteome is an entity which can
change under different conditions, and can be dissimilar in different tissues of a single
organism. A proteome nevertheless remains a direct product of a genome.
Interestingly, the number of proteins in a proteome can exceed the number of genes
present, as protein products expressed by alternative gene splicing or with different
post-translational modifications are observed as separate molecules on a 2-D gel. As an
extrapolation of the concept of the 'genome project', a 'proteome project' is research
which seeks to identify and characterise the proteins present in a cell or tissue and
define their patterns of expression.

Proteome projects present challenges of a similar magnitude to those of genome
projects. Technically, the 2-D gel electrophoresis must be reproducible and of high
resolution, allowing the separation and detection of the thousands of proteins in a cell.
Low copy number proteins should be detectable. There should be computer gel image
analysis systems that can qualitatively and quantitatively catalog the
electrophoretically separated proteins, to form reference maps. A range of rapid and
reliable techniques must be available for the identification and characterisation of
proteins. As a consequence of a proteome project, protein databases must be assembled
that contain reference information about proteins; such databases must be linked to
genomic databases and protein reference maps. Databases should be widely accessible
and easy to use.

Recently, there have been many changes in the techniques and resources available
for the analysis of proteomes. It is the aim of this chapter to discuss the status of the
areas outlined above, and to review briefly the progress of some current proteome
projects.

Two-dimensional electrophoresis of proteomes 

Two-dimensional (2-D) gel electrophoresis involves the separation of proteins by their
isoelectric point in the first dimension, then separation according to molecular weight
by sodium dodecyl sulfate electrophoresis in the second dimension. Since it was first
described (Klose, 1975; O'Farrell, 1975; Scheele, 1975), it has become the method of
choice for the separation of complex mixtures of proteins, albeit with many
modifications to the original techniques. 2-D electrophoresis forms the basis of proteome
projects through separating proteins by their size and charge (Hochstrasser et al.,
2-D GEL RESOLUTION AND REPRODUCIBILITY

[...] vital to allow comparison of gels from day to day and between research sites. These
factors can be difficult to achieve.

Carrier ampholytes are a common means of isoelectric focusing for the first
dimension of 2-D electrophoresis. Gels are usually focused to equilibrium to separate
proteins in the pI range 4 to 8, and run in a non-equilibrium mode (NEPHGE) to
separate proteins of higher pI (7 to 11) (O'Farrell, 1975; O'Farrell, Goodman and
O'Farrell, 1977). Unfortunately, the use of carrier ampholytes in the isoelectric
focusing procedure is susceptible to 'cathodic drift', whereby pH gradients established
by prefocusing of ampholytes slowly change with time (Righetti and Drysdale,
19[year illegible]). Carrier ampholyte pH gradients are also distorted by high salt
concentration of samples (Bjellqvist et al., 1982), and by high protein load
(O'Farrell, 1975). A further limitation is that isoelectric focusing gels, which are
cast and subjected to electrophoresis in narrow glass tubes, need to be extruded by
mechanical means before application to the second dimension, a procedure that
potentially distorts the gel. Nevertheless, many of the above shortcomings can be
avoided by loading small amounts of 14C or 35S radiolabelled samples (Garrels, 1989;
Neidhardt et al., 1989; Vandekerckhove et al., 1990). High sensitivity detection is then
achieved through use of fluorography or phosphorimaging plates (Bonner and Laskey,
1974; Johnston, Pickett and Barker, 1990; Patterson and Latter, 1993). However, this
approach is only practicable for organisms or tissues that can be radiolabelled.

An alternative technique, which is becoming the method of choice for the first
dimension separation of proteins, involves isoelectric focusing in immobilized pH
gradient (IPG) gels (Bjellqvist et al., 1982; Gorg, Postel and Gunther, 1988; Righetti,
1990). Immobilized pH gradients are formed by the covalent coupling of the pH
gradient into an acrylamide matrix, creating a gradient that is completely stable with
time. IPG gels are usually poured onto a stiff backing film, which is mechanically
strong and provides easy gel handling (Ostergren, Eriksson and Bjellqvist, 1988). The
major advantages of IPG separations are that they do not suffer from cathodic drift,
they allow focusing of basic and very acidic proteins to equilibrium, pH gradients can
be precisely tailored (linear, stepwise, sigmoidal), and separations over a very
narrow pH range are possible (0.05 pH units per cm) (Righetti, 1990; Bjellqvist et al.,
1982, 1993a; Sinha et al., 1990; Gorg et al., 1988; Gelfi et al., 1987; Gunther et al.,
1988). However, it is not currently possible to use IPG gels to separate very basic
proteins of isoelectric point greater than 10, although this is under development.
Narrow pH range separations are useful to address problems of protein co-migration
in complex samples, allowing 'zooming in' on regions of a gel (Figure 2). IPG gel
strips are now commercially available, which begin to address the problems of intra-
and inter-lab isoelectric focusing reproducibility.

There are two means of electrophoresis for the second dimension separation of
proteins: vertical slab gels and horizontal ultrathin gels (Gorg, Postel and Gunther,
1988). Both are usually SDS-containing gradient gels of approximately [...] to 15% T
acrylamide, which separate proteins in the molecular mass range of 10 - 150 kD. A
stacking gel is not usually used with slab gels, but is necessary when using horizontal
gel setups (Gorg, Postel and Gunther, 1988). Comparisons have shown that there is
little or no difference in the reproducibility of electrophoresis using either approach
(Corbett et al., 1994a), but commercially available vertical or horizontal precast gels
will provide greater reproducibility for occasional users. For slab gel electrophoresis,
the use of piperazine diacrylyl as a gel crosslinker and the addition of thiosulfate in the
catalyst system has been shown to give better resolution and higher sensitivity
detection (Hochstrasser and Merril, 1988; Hochstrasser, Patchornik and Merril,
Notwithstanding the advances described above, there is an increasing demand to
improve the reproducibility of 2-D electrophoresis to facilitate database construction
and proteome studies. Harrington et al. (1993) explain that if a gel resolves 4000
protein spots, and there is 99.5% spot matching from gel to gel, this will produce 20
spot errors per gel. This amount of error, which might accumulate with each gel to gel
comparison used in database construction, could produce an unacceptable degree of
uncertainty in gel databases. To address these issues, partial automation of large 2-D
gel separations has been undertaken (Nokihara, Morita and Kuriki, 199[year illegible];
Harrington et al., [...]). [...] in one study was found to be threefold improved over
manual methods (Harrington et al., 1993). It should be noted that small 2-D gel formats
(50 x 43 mm) have been almost completely automated (Brewer et al., 1986), although
these are not generally used for database studies.
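The arithmetic behind the spot-matching figures above can be made explicit. The following is a minimal sketch only: the function names are illustrative, and the second function assumes that matching errors are independent and are never corrected along a chain of gel-to-gel comparisons, which is a simplification not stated in the text.

```python
def expected_mismatches(spots: int, match_rate: float) -> float:
    """Expected number of wrongly matched spots in one gel-to-gel comparison."""
    return spots * (1.0 - match_rate)


def fraction_still_correct(comparisons: int, match_rate: float) -> float:
    """Fraction of spots still correctly matched after a chain of comparisons.

    Assumption (not from the text): errors are independent and uncorrected.
    """
    return match_rate ** comparisons


# Figures quoted in the text: 4000 spots, 99.5% matching per comparison.
errors_per_gel = expected_mismatches(4000, 0.995)
print(round(errors_per_gel))  # 20

# Hypothetical chain of ten successive comparisons.
print(fraction_still_correct(10, 0.995))
```

Under these assumptions the proportion of correctly tracked spots falls with every additional comparison, which is the accumulation of uncertainty referred to above.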

MICROPREPARATIVE 2-D GEL ELECTROPHORESIS

With the advent of affordable protein microcharacterisation techniques including
N-terminal microsequencing, amino acid analysis, peptide mass fingerprinting, phosphate
analysis and monosaccharide compositional analysis, a new challenge for 2-D
electrophoresis has been to maintain high resolution and reproducibility but to provide
protein in sufficient quantities for chemical analysis (high nanogram to low microgram
quantities of protein per spot). This becomes difficult to achieve with very complex
samples such as whole bacterial cells, as the initial protein load is divided among
several thousand (up to 4000) protein species. Two approaches are used for producing
amounts of material that can be chemically characterised. The first method is to run
multiple gels, collect and pool the spots of interest, and subject them to concentration
(Ji et al., 1994; Walsh et al., 1995; Rasmussen et al., 1992). In this approach, the
concentration process must also act as a purification step to remove accumulated
electrophoretic contaminants such as glycine. A more elegant approach has been to
exploit the high loading capacity of IPG isoelectric focusing. The high loading capacity
of immobilised pH gradients was described early (Ek, Bjellqvist and Righetti, 1983),
but has only recently been applied to 2-D electrophoresis (Hanash et al., 1991;
Bjellqvist et al., 1993b). Milligram quantities of protein can be applied to a single
gel, yielding microgram quantities of hundreds of protein species. A further benefit of
this approach is that proteins present in low abundance, which may not be visualised
with lower protein loads, are more likely to be detected. The use of electrophoretic or
chromatographic prefractionation techniques (Hochstrasser et al., 1991a; Harrington et
al., 1992), followed by high loading of narrow-range IPG separations (Bjellqvist et al.,
1993b), provides a likely solution to studies on proteins present in low abundance.

Methods of protein detection

There are many means for detecting proteins from 2-D gels. The method used will be
dictated by factors including protein load on the gel (analytical or preparative), the
purpose of the gel (for protein quantitation or for blotting and chemical
characterisation), and the sensitivity required. The most common means of protein
detection and their applications are shown in Table 1. Most detection methods have
drawbacks: for example, some glycoproteins are not stained by coomassie blue (Goldberg
et al., 1988), and many organic dyes are unsuitable for protein detection on PVDF if
samples are to be used for direct matrix-assisted laser desorption ionization mass
spectrometry (Strupat et al., 1994).

Although most means of protein detection give some indication of the quantities of
protein present, in general they cannot be used for global quantitation. This is because
no protein stain is able consistently to detect proteins over a wide range of
concentrations, isoelectric points and amino acid compositions, and with a variety of
post-translational modifications (Goldberg et al., 1988; Li et al., 1989). Furthermore,
there are large differences in staining pattern when identical gels or blots are
subjected to different stains, including amido black, imidazole zinc, india ink, ponceau
S, colloidal gold, or coomassie blue (Tovey, Ford and Baldo, 1987; Ortiz et al.,
[year illegible]). The most common means of quantitating large numbers of proteins in a
2-D gel involves the radiolabelling of protein samples prior to electrophoresis, and
protein quantitation based on fluorography and image analysis or liquid scintillation
counting (Garrels, 1989; Celis and Olsen, 1994). However, proteins which do not contain
methionine cannot be detected if only [35S] methionine is used for labelling. Amino
acid analysis of protein spots visualised by other techniques presents a likely means of
protein quantitation for the future.



BLOTTING OF PROTEINS TO MEMBRANES 



Electrophoretic blotting of proteins from two-dimensional polyacrylamide gels to
membranes presents many options for protein identification and microcharacterisation
which are not possible when proteins remain in gels. For example, when proteins are
blotted to polyvinylidene difluoride (PVDF) membranes, they can be identified by
N-terminal sequencing, amino acid analysis, or immunoblotting, or they may be subjected
to endoproteinase digestion, monosaccharide analysis, phosphate analysis, or direct
matrix-assisted laser desorption ionisation mass spectrometry (Matsudaira, 1987;
Wilkins et al., 1995; Jungblut et al., 1994; Sutton et al., 1995; Rasmussen et al., 1994;
Weitzhandler et al., 1993; Murthy and Iqbal, 1991; Eckerskorn et al., 1992). It is
possible to combine some of these procedures on a single protein spot on a PVDF
membrane (Packer et al., 1995; Wilkins et al., submitted; Weitzhandler et al., 1993).
This is useful when minimal amounts of protein are available for analysis. These
techniques will be explored in detail later in this review. Notwithstanding the above,
there are some disadvantages associated with blotting of proteins to membranes.
There is always loss of sample during blotting procedures (Eckerskorn and Lottspeich,
1993), and common protein detection methods are less sensitive or not applicable to
membranes (Table 1), presenting difficulties for the analysis of low abundance
proteins. Detailed discussion of the merits of available membranes and common
blotting techniques can be found elsewhere (Eckerskorn and Lottspeich, 1993; Strupat
et al., 1994; Patterson, 1994).

2-D gel analysis, documentation, and proteome databases

Following protein electrophoresis and detection, detailed analysis of gel images is
undertaken with computer systems. For proteome projects, the aim of this analysis is
to catalogue all spots from the 2-D gel in a qualitative and if possible quantitative
manner, so as to define the number of proteins present and their levels of expression.
Reference gel images, constructed from one or more gels, form the basis of
two-dimensional gel databases. These databases also contain protein spot identities and



Progress with proteome projects
details of their post-translational modifications. 2-D gel databases [...] linked to or
integrated with comprehensive [...] databases, containing DNA sequence data, [...]
(Yeast Protein Database cited in Garrels et al., 1994; [...]).



GEL IMAGE ANALYSIS AND REFERENCE GELS 

After 2-D electrophoresis and protein visualisation by staining or phosphorimaging,
images of gels are digitised for computer analysis by an image scanner, laser
densitometer, or charge-coupled device (CCD) camera (Garrels, 1989; Celis et al., 1990a;
Urwin and Jackson, 1993). [...] a resolution of 100-200 µm, [...] 256 or more 'grey
scales'. Following this, gel images are subjected to a series of manipulations to remove
vertical and horizontal [...], to [...] spot positions and boundaries, and to calculate
[...]. A standard spot (SSP) number, containing vertical and horizontal positional
information, is assigned to each detected spot [...]. Table 2 lists some notable
software packages [...].



Table 2: Some Software Packages for the Analysis of Gel Images
Gel Image Analysis System        References

CALCULATION OF PROTEIN ISOELECTRIC POINT AND MOLECULAR WEIGHT

Estimation of the isoelectric point (pI) and molecular weight (MW) of proteins from
2-D gels provides fundamental parameters for each protein, which are also of use
during identification procedures (see following section). The pI and MW of proteins
are recorded in 2-D gel databases. Accurate estimations of protein pI and MW can be
obtained by using 20 or more known proteins on a reference map to construct standard
curves of pI and molecular weight, which are then used to calculate the estimated pI and
MW of unknown proteins (Neidhardt et al., 1989; Garrels and Franza, 1989; VanBogelen,
Hutton and Neidhardt, 1990; Anderson and Anderson, 1991; Anderson et al., 1991; Latham
et al., 1992). Alternatively, the MW of individual proteins blotted to PVDF can be
determined very accurately by direct mass spectrometry (Eckerskorn et al., 1992).
Where immobilised pH gradients are used, the focusing position of proteins allows
their pI to be measured within 0.15 units of that calculated from the amino acid
sequence (Bjellqvist et al., 1993c). It must be noted, however, that proteins carrying
post-translational modifications may migrate to unexpected pI or MW positions during
electrophoresis (Packer et al., 1995).
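The standard-curve approach can be sketched in a few lines. This is an illustration under stated assumptions, not the procedure of any cited package: it assumes that log10(MW) varies smoothly with migration distance in the second dimension and that pI varies smoothly with position in the first, and it interpolates linearly between neighbouring markers. The marker values and function names are hypothetical, and a real reference map would use the 20 or more known proteins mentioned above.

```python
import math
from bisect import bisect_left


def interpolate(x, xs, ys):
    """Piecewise-linear interpolation; xs must be strictly increasing."""
    if not xs[0] <= x <= xs[-1]:
        raise ValueError("position lies outside the calibrated range")
    i = max(1, bisect_left(xs, x))
    x0, x1, y0, y1 = xs[i - 1], xs[i], ys[i - 1], ys[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def estimate_mw(migration_mm, markers):
    """markers: (migration_mm, known MW in Da) pairs for identified spots."""
    pts = sorted(markers)
    xs = [p[0] for p in pts]
    log_mw = [math.log10(p[1]) for p in pts]  # assumed log-linear relationship
    return 10 ** interpolate(migration_mm, xs, log_mw)


def estimate_pi(position_mm, markers):
    """markers: (position_mm, known pI) pairs for identified spots."""
    pts = sorted(markers)
    return interpolate(position_mm, [p[0] for p in pts], [p[1] for p in pts])


# Hypothetical calibration spots, for illustration only.
mw_markers = [(10.0, 100000.0), (50.0, 10000.0)]
pi_markers = [(0.0, 4.0), (100.0, 8.0)]
print(estimate_mw(30.0, mw_markers), estimate_pi(25.0, pi_markers))
```

Positions outside the calibrated range raise an error rather than being extrapolated, since estimates beyond the outermost known proteins would be unreliable.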



SPOT QUANTITATION AND EXPRESSION ANALYSIS

A major challenge faced in proteome projects is the quantitative analysis of proteins
separated by 2-D electrophoresis. The most accurate means of protein quantitation is
to determine chemically the amount of each protein present by amino acid
compositional analysis. However, the current method of choice for quantitative analysis
of many proteins is to radiolabel samples with [35S] methionine or 14C amino acids,
perform the 2-D electrophoresis, and measure protein levels in disintegrations per
minute (dpm) or units of optical density. Quantitation is achieved either by liquid
scintillation counting, or by gel image analysis where spot densities are quantified
by reference to gel calibration strips containing known amounts of radiolabelled
protein or against the integrated optical density of all spots visualised
(Vandekerckhove et al., 1990; Celis et al., 1990b; Celis and Olsen, 1994; Garrels, 1989;
Latham, Garrels and Solter, 1993; Fey et al., 1994). All approaches effectively allow
spots to be normalised against the total disintegrations per minute loaded onto the gel.
Limitations that remain with radiolabelling methods are that absolute quantitation is
not achieved, because all proteins have varying amounts of any amino acid, and that
only easily labelled samples can be investigated. Quantitative silver staining presents
an alternative (Giometti et al., 1991; Harrington et al., 1992; Rodriguez et al., 1993;
Myrick et al., 1993), which when undertaken with [35S]thiourea (Wallace and Saluz,
[year illegible]) is of extremely high sensitivity.
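The normalisation step described above can be sketched as follows. This is a minimal illustration with hypothetical spot names and counts; expressing each spot in parts per million of the total is an assumed convention, chosen here only to give convenient numbers.

```python
from typing import Dict, Optional


def normalise_spots(spot_dpm: Dict[str, float],
                    total_dpm: Optional[float] = None) -> Dict[str, float]:
    """Express each spot as parts per million of the total signal.

    If total_dpm (total dpm loaded onto the gel) is not supplied, the
    integrated signal of all measured spots is used instead.
    """
    total = sum(spot_dpm.values()) if total_dpm is None else total_dpm
    if total <= 0:
        raise ValueError("total signal must be positive")
    return {spot: 1e6 * dpm / total for spot, dpm in spot_dpm.items()}


# Hypothetical spot measurements (dpm), for illustration only.
spots = {"SSP 1201": 300.0, "SSP 3405": 100.0}
print(normalise_spots(spots, total_dpm=1000.0))  # relative to dpm loaded
print(normalise_spots(spots))                    # relative to all spots seen
```

Values normalised in this way are comparable between gels, but as noted above they remain relative rather than absolute amounts, because proteins differ in their content of the labelled amino acid.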

When protein spots from samples prepared under different conditions are quantitated
and matched from gel to gel, it becomes possible to examine changes and patterns in
protein expression. Large scale investigation of up- and down-regulation of proteins,
and of their appearance and disappearance, can be undertaken. For example, simian virus
40 transformed human keratinocytes were shown to have 177 up-regulated and 58
down-regulated proteins compared to normal keratinocytes (Celis and Olsen, 1994);
detailed synthesis profiles of 1200 proteins have been established in 1 to 4 cell mouse
embryos (Latham et al., 1991, 1992); and 4 proteins out of 1971 were found to be markers
for cadmium toxicity in urinary proteins (Myrick et al., 1993; [...] personal
communication). Impressively, large sets of data on protein expression under different
conditions can be globally investigated by statistical methods that find groups of
related objects within a set. For example, the REF52 rat cell line database, consisting
of 79 gels from 12 experimental groups, where each gel contains quantitative data for
1600 cross-matched proteins, has been investigated by cluster analysis (Garrels et al.,
1990). This revealed clusters of proteins that, for example, were induced or repressed
similarly under [...] transformation, suggesting a common mechanism. [...] or repressed
during culture growth to confluence were also [...]. It is equally clear that
investigations of gene expression [...]




Database: [name not recoverable]
Features: Gel spots linked with GenBank and Kohara clones; quantitative spot
measurements under different growth conditions
References: [not recoverable]

Database: Human heart databases
Features: Identification of disease markers; two separate databases have been
established
References: Baker et al., 1992; Corbett et al., [year illegible]b; Jungblut et al., 1994

Database: Human keratinocyte database
Features: Extensive identifications; quantitative spot measurements of transformed
cells; identification of disease markers
References: Celis et al., [year illegible]; Celis et al., [year illegible]; Celis and
Olsen, 1994

Database: Mouse embryo database
Features: Quantitative spot measurements through 1 to 4 cell stage
References: Latham et al., 1991; Latham et al., 1992

Database: Mouse liver database (Argonne Protein Mapping Group)
Features: Documents changes due to exposure to ionizing radiation and toxic chemicals
References: Giometti, Taylor and Tollaksen, 1992

Database: Rat liver epithelial database
Features: Detailed subcellular fractionation studies
References: [not recoverable]

Database: Rat liver database
Features: Extensive studies on regulation of proteins by drugs and toxic agents
References: Anderson and Anderson, 1991; Anderson et al., [year illegible];
Richardson, Horn and Anderson, [year illegible]

Database: REF52 rat cell line database
Features: Accessible via World Wide Web; quantitative spot measurements under
different conditions
References: Garrels and Franza, 1989; Boutell et al., [year illegible]

Database: SWISS-2DPAGE, containing human reference maps
Features: Accessible via World Wide Web; completely integrated with SWISS-PROT and
SWISS-3DIMAGE
References: Appel et al., [year illegible]; Hochstrasser et al., [year illegible];
Hughes et al., [year illegible]; Golaz et al., 1993

Database: Yeast Protein Database (YPD) and Yeast Electrophoretic Protein Database
(YEPD)
Features: Completely crossreferenced organism database. YPD has extensive information
on over [number illegible] proteins. YEPD has many identifications
References: Garrels et al., 1994





FEATURES OF PROTEOME DATABASES 






Proteome projects rely heavily on computer databases to store information about all
proteins expressed by an organism. 'Proteome databases' should contain detailed
information on proteins already characterised elsewhere, as well as protein data from
2-D gels such as apparent pI and MW, expression level under different conditions,
subcellular localisation, and information on post-translational modifications. Images
of [...] protein SSP number, protein identification [...] should also be included.
Ideally, proteome databases should be accessible with Macintosh or IBM personal
computers and easy to use. Some proteome databases and the areas they cover are listed
in Table 3. Databases range from collections of annotated gels to large databases of
images integrated with protein and nucleic acid sequence banks.

One example of an integrated proteome database is the suite of SWISS-PROT,
SWISS-2DPAGE and SWISS-3DIMAGE databases ([...], 1994; Appel, Bairoch and Hochstrasser,
1994; Bairoch and Boeckmann, [year illegible]). The features of these three databases
are listed in Table 4. SWISS-PROT, SWISS-2DPAGE and SWISS-3DIMAGE are accessible
through the World Wide Web



Table 4: The SWISS-PROT, SWISS-2DPAGE and SWISS-3DIMAGE suite [...]
All three databases are accessible through the World Wide Web, at URL address http://

SWISS-PROT
  Information: Text entries of sequence data; citation information; taxonomic data.
    [number illegible] entries in Release [number illegible]
  Annotations: Protein function; post-translational modifications; domains;
    secondary structure; quaternary structure; diseases associated with protein;
    sequence conflicts
  Cross-Referenced Databases: SWISS-2DPAGE, SWISS-3DIMAGE, EMBL, PIR, PDB, OMIM,
    PROSITE, Medline, Flybase, GCRDb, MaizeDB, WormPep, DictyDB
  Other Features: Navigation to other SWISS databases achieved by selecting entries
    with computer mouse

SWISS-2DPAGE
  Information: 2-D gel images of: human liver, plasma, HepG2, HepG2 secreted
    proteins, red blood cell, lymphoma, cerebrospinal fluid, macrophage like cell
    line, erythroleukemia cell, platelet
  Annotations: Gel images where protein found; how protein identified; protein pI
    and MW; protein number; normal and pathological variants
  Cross-Referenced Databases: SWISS-PROT and all other databases accessible through
    SWISS-PROT
  Other Features: Gel images show position of identified proteins, or region of gel
    where protein should appear

SWISS-3DIMAGE
  Information: Collection of [...] images of proteins
  Annotations: All annotation is available in SWISS-PROT
  Cross-Referenced Databases: SWISS-PROT and all other databases accessible through
    SWISS-PROT
  Other Features: Mono and stereo images available. Images can be transferred to
    local computer image viewing programs
(Berners-Lee et al., 1992), allowing any computer [...] the stored information and
images. [...] the three databases is seamless, as all potential crosslinks [...] can
be selected with a computer mouse. From these databases, detailed information about a
protein, including amino acid sequence and [...] post-translational modifications, can
be obtained; the precise protein spot [...] can be viewed if known; and the 3-D
structure [...] if available. References to nucleic acid and [...] provide access to
information stored elsewhere.

Organism databases, containing [...]. These differ from nucleic acid or protein
sequence databases [...] SWISS-PROT because they are image [...] about chromosomal map
positions, transcription of genes [...]. The [...] gene-protein database (VanBogelen,
Hutton and Neidhardt, 1990; VanBogelen and Neidhardt, [year illegible]), [...] known as
ECO2DBASE, is one example. It contains 2-D gel spot information (including pI and MW
estimates and [...] identification), genetic information (GenBank or EMBL codes,
chromosomal [...], Kohara clones (Kohara, Akiyama and Isono, [...])), and regulatory
information (level of protein [...], member of regulon or stimulon). All entries [...]
referenced to the SWISS-PROT database (Bairoch and Boeckmann, [year illegible]). It is
anticipated that organism databases will soon [...] available information about a
[...]. [...] there is currently no [...] consistent manner in which organism databases
are constructed, which may hamper [...] comparisons in the future.





Identification and characterisation of proteins from 2-D gels

The number of proteins identified on a 2-D reference map determines its usefulness as
a research and reference tool. As most reference maps [...] proteins identified, a
major aim of [...] is to screen many proteins from 2-D maps, in order to define them as
[...] in current databases, or as unknown proteins. [...] confirmation of DNA open
reading frames, and provides [...] projects and protein characterisation efforts by
pointing to proteins [...]. Since there [...] up to 4000 proteins from a single 2-D
map, [...] challenge in protein screening is to identify [...] with a minimum of cost
and effort. Traditionally, proteins from [...] have been identified by techniques such
as immunoblotting, N-terminal microsequencing, [...] peptide sequencing, comigration of
unknown proteins with known proteins, or by overexpression of homologous genes of
interest ([...], 1992; VanBogelen [...]; Garrels et al., 1994). Whilst these [...]



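Table 5: Hierarchical analysis for mass screening of [...]
Rapid and inexpensive techniques are used as a [...]

Identification technique: Amino acid analysis with N-terminal sequence tag
References: Jungblut et al., [year illegible]; Shaw, [year illegible]; Hobohm,
Houthaeve and Sander, [year illegible]; Jungblut et al., [year illegible]; [...];
Wilkins et al., submitted

Identification technique: Peptide-mass fingerprinting
References: Henzel et al., [year illegible]; Pappin, Hojrup and Bleasby,
[year illegible]; James et al., [year illegible]; Mann, Hojrup and Roepstorff,
[year illegible]; Yates et al., [year illegible]; Mortz et al., [year illegible];
Sutton et al., 1995

Identification technique: Combination of amino acid analysis and peptide mass
fingerprinting
References: Cordwell et al., 1995; Wasinger et al., 1995

Identification technique: Mass spectrometry sequence tag
References: Mann and Wilm, [year illegible]

Identification technique: Extensive N-terminal Edman microsequencing
References: Matsudaira, 1987

Identification technique: Internal peptide Edman microsequencing
References: Rosenfeld et al., [year illegible]; Hellman et al., 1995

Identification technique: Microsequencing by mass spectrometry (electrospray
ionisation, post-source decay MALDI-TOF)
References: Johnson and Walsh, [year illegible]

Identification technique: Ladder sequencing
References: Bartlet-Jones et al., [year illegible]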



[...] use of rapid and cheap identification tools such as amino acid analysis and
peptide mass fingerprinting as first steps in protein identification, [...] the use of
slower, more expensive and time consuming identification procedures [...]. In the
construction of this hierarchy, the analysis time, the cost [...] of the data created
has been considered [...] machine time per sample, the analysis of data [...]. Amino
acid analysis and peptide mass fingerprinting [...] techniques in the hierarchy are
discussed [...] identification techniques in Table 5 [...].

PROTEIN IDENTIFICATION BY AMINO ACID COMPOSITION 
There has been a revival of interest in the [...] (Eckerskorn et al., [...]).
This technique uses a protein's idiosyncratic [...] proteins in databases.
The amino acid composition of proteins [...] radiolabelling and [...] electrophoresis
(Garrels et al., 1994; Frey et al., 1994) or by [...]
Figure 4. Computer printout from the ExPASy server, where the empirical amino acid
composition [...] matched against SWISS-PROT entries, together with estimated pI and
MW ranges. [...] greatly increased the score difference between the first and second
ranking proteins. [Printout not reproduced; remainder of caption illegible.]

[...]graphy-based analysis. Proteins blotted to PVDF membranes can be hydrolysed in
1 h at [temperature illegible], amino acids extracted in a single brief step, and each
sample automatically derivatised and separated by chromatography in under 40 minutes
(Wilkins et al., 1995; Ou et al., 1995). In this manner, one operator can routinely
analyse 100 proteins per week on one HPLC unit. This technology lends itself to
automation and it is anticipated that instruments with even greater sample throughput
will be developed. When proteins have been prepared by micropreparative 2-D
electrophoresis (Hanash et al., 1991; Bjellqvist et al., 1993b), blotted to a PVDF
membrane and stained with amido black, any visible protein spot is of sufficient
quantity for amino acid analysis (Cordwell et al., 1995; Wasinger et al., 1995; Wilkins
et al., 1995).

After the amino acid composition of a protein has been determined, computer
programs are used to match it against the calculated compositions of proteins
in databases (Eckerskorn et al., 1988; Sibbald, Sommerfeldt and Argos, 1991;
Jungblut et al., 1992; Shaw, 1993; Hobohm, Houthaeve and Sander, 1994; Wilkins
et al., [year illegible]). Matching is usually done with only 15 or 16 amino
acids, as cysteine and tryptophan are destroyed during hydrolysis, asparagine
and glutamine are deamidated to their corresponding acids, and proline is not
quantitated in some analysis systems. The computer programs produce a list of
best matching proteins, which are ranked by a score that indicates the match
quality. Some programs allow matching to be restricted to a 'window' of MW and
pI (Hobohm, Houthaeve and Sander, 1994; [...]), or to protein entries for one
[...]. The use of such restrictions [increases the] power of matching. An
example of protein identification by amino acid composition is shown in
Figure 4. To date, amino acid composition has been used to identify proteins
from reference maps of Spiroplasma melliferum, Mycoplasma genitalium, [...],
Saccharomyces cerevisiae, Dictyostelium discoideum, human sera, human heart,
human lymphocyte, and mouse brain (Cordwell et al., 1995; Wasinger et al.,
1995; [...]).

[Figure 5. A PVDF protein spot from an E. coli 2-D reference map was subjected
to N-terminal sequencing, after which the same sample was used for amino acid
analysis. The amino acid composition of the spot, together with estimated pI
and MW, was matched against the database, and the N-terminal sequence tag was
compared with the best matches. The remainder of the caption is illegible in
the source.]
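The matching step described above can be sketched in a few lines. This is a minimal illustration only, not the method of any of the cited programs: the scoring metric (sum of squared differences in molar percent), the function names, the database layout and the 20% MW window are all assumptions made for the example. Proline is kept in the comparison here, although some analysis systems do not quantitate it.

```python
from typing import Dict, List, Optional, Tuple

POOLED = {"N": "D", "Q": "E"}   # Asn and Gln are deamidated to Asp and Glu
EXCLUDED = {"C", "W"}           # Cys and Trp are destroyed during hydrolysis


def composition(sequence: str) -> Dict[str, float]:
    """Molar percent composition as it would be seen after acid hydrolysis."""
    counts: Dict[str, int] = {}
    for aa in sequence.upper():
        if aa in EXCLUDED:
            continue
        key = POOLED.get(aa, aa)
        counts[key] = counts.get(key, 0) + 1
    total = sum(counts.values())
    return {aa: 100.0 * n / total for aa, n in counts.items()} if total else {}


def score(measured: Dict[str, float], theoretical: Dict[str, float]) -> float:
    """Assumed metric: sum of squared differences; 0 is a perfect match."""
    keys = set(measured) | set(theoretical)
    return sum((measured.get(k, 0.0) - theoretical.get(k, 0.0)) ** 2 for k in keys)


def rank_matches(
    measured: Dict[str, float],
    database: Dict[str, Tuple[str, float]],   # name -> (sequence, MW in Da)
    mw: Optional[float] = None,
    mw_window: float = 0.2,                   # hypothetical +/- 20% MW window
) -> List[Tuple[str, float]]:
    """Return (name, score) pairs, best (lowest) score first."""
    results = []
    for name, (sequence, entry_mw) in database.items():
        if mw is not None and abs(entry_mw - mw) > mw_window * mw:
            continue  # entry falls outside the MW window
        results.append((name, score(measured, composition(sequence))))
    return sorted(results, key=lambda item: (item[1], item[0]))
```

A large gap between the scores of the first- and second-ranked entries is the kind of evidence the Figure 4 caption appears to refer to; a small gap suggests the identification needs support from another technique.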



PROTEIN IDENTIFICATION BY AMINO ACID COMPOSITION AND N-TERMINAL SEQUENCE TAG



When samples from 2-D gels are not unambiguously identified by amino acid
composition, pI and MW, often the correct [...] (Wilkins et al., [...]).
Taking advantage of this observation, [...] combined Edman degradation and
amino acid analysis [...] protein identification (Wilkins et al., submitted).
This involves the N-terminal sequencing of PVDF-blotted proteins by Edman
degradation for [...] or 4 cycles [...] 'sequence tag', after which the same
sample is used for amino acid analysis. [...] amino acids are removed from the
protein, its composition is not significantly altered. Furthermore, [...]
Edman degradation cycles can be [...] allow 3 cycles to be completed in 1 h,
thereby [...] proteins per week on one automated, multi-[...]. The
composition, pI and MW of proteins are matched against [...], and N-terminal
sequences of best matching proteins [...] with the sequence tag to confirm the
protein identity (Figure 5). [...] proteins are N-terminally blocked, but as
only a few [...] amino acids are susceptible to the acetyl, formyl, or
pyroglutamyl [...], this may itself provide useful information for [...]
identification. A strength of N-terminal sequence tag and amino acid
composition protein identification is that data generated are quickly and
easily interpreted.
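The confirmation step can be illustrated as a simple filter over the ranked list produced by composition matching. The sketch below is an assumption-laden illustration: the function name, the use of `X` for a cycle that could not be called, and the plain prefix comparison are choices made for the example, not details taken from the source.

```python
from typing import Dict, List


def confirm_with_tag(
    ranked_names: List[str],
    sequences: Dict[str, str],   # entry name -> database sequence
    tag: str,                    # N-terminal sequence tag, e.g. 3 or 4 residues
) -> List[str]:
    """Keep ranked candidates whose N-terminus agrees with the sequence tag.

    'X' in the tag stands for an unidentified cycle and matches any residue
    (an assumption made for this sketch).
    """
    confirmed = []
    for name in ranked_names:
        start = sequences.get(name, "")[: len(tag)]
        if len(start) == len(tag) and all(
            t == "X" or t == s for t, s in zip(tag.upper(), start.upper())
        ):
            confirmed.append(name)
    return confirmed
```

Because the input order is preserved, the first entry returned is the best composition match that is also consistent with the tag.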

PROTEIN IDENTIFICATION BY PEPTIDE MASS FINGERPRINTING
Techniques for the identification of proteins by peptide mass fingerprinting
have recently been described (Henzel et al., 1993; Pappin, Hojrup and Bleasby,
1993; James et al., 1993; Mann, Hojrup and Roepstorff, 1993; [...]; Mortz
et al., 1994; Sutton et al., 1995). [...] with residue-specific enzymes, the
determination of [...], and the matching of these masses against theoretical
[...] from protein sequence databases. As proteins have different amino acid
sequences, their peptides should produce characteristic fingerprints. [...]
digests are reported to produce more [...] peptide masses [...] modified
sequencing grade, but other enzymes [...] have also been used (Pappin, Hojrup
and Bleasby, 1993). [...] the number of peptides obtained, it is desirable for
proteins [...]. This ensures that all disulfide bonds of the protein are
broken. [...] Cyanogen bromide (methionine specific) [...] (Nikodem and
Fresco, 1979; Crimmins et al., 1990; [...]).

After [...] are digested, [...]. Direct analysis of peptide mixtures [...] is
preferable because of its higher sensitivity and greater tolerance [...] from
2-D gels (James et al., 1993; Mortz et al., 1994; [...]; Vorm and Mann, 1994;
Vorm, Roepstorff and Mann, 1994). [...] of mass spectrometry allows a small
fraction of a digest to be used for analysis, and analysis itself [...].

A major challenge associated with peptide [...]. Prior to computer matching
[...], spectra must be examined carefully to determine which [...] of
interest, as there are often autodigestion [...] and contaminating substances
present (Henzel et al., 1993; Mortz et al., 1994; [...]). Furthermore, if
protein alkylation and reduction [...] undertaken prior to protein digestion,
peptide sequence coverage [...] with some [...] present in the protein (Mortz
et al., 1994). For eukaryotes, a [...] of the unmodified peptide alone can be
very difficult to determine, [...] artifactual modifications introduced by
electrophoresis [...] and the oxidation of methionine, are [...] peptide
masses ([...]; Hess et al., [...]).

A number of computer programs are available for matching peptide masses
against databases (reviewed in Cottrell, 1994). [...] various search
parameters including MW of protein, mass accuracy of peptides, [and] number of
missed enzyme cleavages allowed (Henzel et al., 1993; Mortz et al., 1994;
Rasmussen et al., 1994). The correct protein identity is the protein which has
the most peptide masses in common with the unknown sample. Identities have
been established with as few as three peptides, but unambiguous identification
is thought to require a mass spectrometric map covering most peptides of the
protein (Mortz et al., 1994; Yates et al., 1993). To date, peptide mass
fingerprinting of proteins has been undertaken from the human myocardial
protein and keratinocyte maps, from an E. coli 2-D gel, and from reference
maps of Spiroplasma melliferum and [...] genitalium (Sutton et al., 1995;
Rasmussen et al., 1994; Henzel et al., 1993; Cordwell et al., 1995; Wasinger
et al., 1995), although the technique is most powerful [...] another protein
identification technique (Rasmussen et al., 1994; [...], 1995).
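The search logic described above (theoretical digestion, peptide mass calculation, and counting masses in common within a mass accuracy and a missed-cleavage allowance) can be sketched as follows. This is an illustration under stated assumptions, not any published program: it assumes trypsin-like cleavage (after K or R, not before P), unmodified residues, neutral monoisotopic masses, and a default tolerance of 0.5 Da; all function names are hypothetical.

```python
from typing import Dict, List, Tuple

# Monoisotopic residue masses in Da (unmodified residues).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056


def digest(sequence: str, missed_cleavages: int = 0) -> List[str]:
    """Trypsin-like digest (assumed rule): cleave after K/R unless P follows."""
    sites = [0]
    for i in range(len(sequence) - 1):
        if sequence[i] in "KR" and sequence[i + 1] != "P":
            sites.append(i + 1)
    sites.append(len(sequence))
    peptides = []
    for i in range(len(sites) - 1):
        last = min(i + 2 + missed_cleavages, len(sites))
        for j in range(i + 1, last):
            peptide = sequence[sites[i]:sites[j]]
            if peptide:
                peptides.append(peptide)
    return peptides


def peptide_mass(peptide: str) -> float:
    """Neutral monoisotopic mass: residue masses plus one water."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def count_matches(observed: List[float], sequence: str,
                  tolerance: float = 0.5, missed_cleavages: int = 0) -> int:
    """Number of observed masses matched by any theoretical peptide."""
    theoretical = [peptide_mass(p) for p in digest(sequence, missed_cleavages)]
    return sum(
        any(abs(obs - theo) <= tolerance for theo in theoretical)
        for obs in observed
    )


def rank_proteins(observed: List[float], database: Dict[str, str],
                  tolerance: float = 0.5,
                  missed_cleavages: int = 0) -> List[Tuple[str, int]]:
    """Rank database entries by the number of peptide masses in common."""
    hits = [(name, count_matches(observed, seq, tolerance, missed_cleavages))
            for name, seq in database.items()]
    return sorted(hits, key=lambda item: (-item[1], item[0]))
```

In practice the observed mass list would first be cleaned of enzyme autodigestion peaks and contaminants, as the text notes, since those inflate or distort the match counts.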



MASS SPECTROMETRY SEQUENCE TAGGING

An extension of peptide mass fingerprinting has recently been described, called
peptide sequence tagging (Mann and Wilm, 1994; Mann, 1995). This uses tandem
mass spectrometry (MS/MS) to initially determine the mass of peptides, then
subject them to fragmentation by collision with a gas, and finally determine
the mass of fragments. The resulting spectra give information about a
peptide's amino acid sequence. The fragmentation masses of peptides can rarely
be used to assign a complete sequence, but they usually allow a short
'sequence tag' of 2 or 3 amino acids to be determined. This sequence tag and
the original peptide mass are matched by computer against a database,
providing a likely identity of the peptide and the protein it was derived
from. The major drawbacks of this technique as a mass screening tool are the
complexity of the mass data generated and the high level of expertise required
for its interpretation. Nevertheless, it represents a useful new protein
identification method which greatly increases the power of peptide mass
fingerprinting protein identification.
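The database match described above combines two constraints: the intact peptide mass and the short sequence tag. The sketch below shows only that combination and is a simplification: published sequence tag searches also use the fragment masses flanking the tag, which are omitted here. The `Peptide` record, the 0.5 Da tolerance and the function name are assumptions for illustration, and theoretical peptide masses are taken as precomputed.

```python
from typing import List, NamedTuple


class Peptide(NamedTuple):
    protein: str     # database entry the peptide comes from
    sequence: str    # peptide sequence from a theoretical digest
    mass: float      # theoretical peptide mass in Da, computed elsewhere


def normalise(sequence: str) -> str:
    """Leu and Ile have identical masses, so treat them as one residue."""
    return sequence.upper().replace("I", "L")


def match_sequence_tag(peptide_mass: float, tag: str,
                       candidates: List[Peptide],
                       tolerance: float = 0.5) -> List[Peptide]:
    """Candidates that agree with both the peptide mass and the tag."""
    wanted = normalise(tag)
    return [
        p for p in candidates
        if abs(p.mass - peptide_mass) <= tolerance
        and wanted in normalise(p.sequence)
    ]
```

Even a tag of 2 or 3 residues removes most candidates that happen to share the peptide mass, which is why the combination is more discriminating than mass alone.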

Cross-species protein identification 

Protein sequence databases continue to grow at a rapid rate, yet it is not
widely appreciated that close to 90% of all information contained in current
protein databases comes from only [?]0 species (A. Bairoch, Pers. Comm.).
Fortunately, this information can be used to study proteomes of organisms that
are poorly defined at [...] 2-D electrophoresis [...] protein identification
(Cordwell et al., 1995; Wasinger et al., 1995). This approach allows proteins
from reference maps of many different species to be identified without the
need for the corresponding genes to be cloned and sequenced. This is
particularly true for 'housekeeping' proteins such [...] DNA [...] and protein
manufacture, which are highly conserved across species boundaries. Proteins
that cannot be identified across species boundaries can then become the focus
of further protein characterisation and DNA sequencing efforts.

Rapid cross-species identification of proteins from 2-D reference maps can be
undertaken with amino acid composition or peptide mass fingerprinting methods
(Figure 6), but these techniques alone may not identify proteins unambiguously
when phylogenetic cross-species distances are great or analysis data is of
poor quality ([...] et al., 1995; Shaw, 1993; Cordwell et al., 1995). However,
very high confidence in protein identities can be achieved when lists of
best-matching proteins generated by both techniques are compared (Cordwell
et al., 1995; Wasinger et al., 1995). The correct identification is found when
the same protein is ranked highly in lists of best matches generated by both
techniques. This method has allowed approximately 120 proteins from the
reference map of the mollicute Spiroplasma melliferum, representing
approximately one quarter of the proteome, to be confidently identified by
reference to protein information from other species (S. Cordwell, Personal
Communication). When cross-species protein identification is to be undertaken,
it should be noted that the molecular weight of a protein type across species
is usually highly conserved, but that protein pI can vary by more than 2 units
(Cordwell et al., 1995). Accurate molecular weight determination by direct
mass spectrometry of proteins blotted to PVDF (Eckerskorn et al., 1992) should
therefore be a useful additional parameter for cross-species protein
identification.
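The comparison of best-match lists from the two techniques can be sketched as a rank intersection. This is a minimal illustration only: the cut-off of the top 10 entries, the ordering by summed rank and the function name are assumptions for the example, and entries are assumed to be labelled by protein type so that the same protein from different species compares as equal.

```python
from typing import List, Tuple


def consensus_matches(composition_ranked: List[str],
                      fingerprint_ranked: List[str],
                      top_n: int = 10) -> List[Tuple[str, int, int]]:
    """Proteins ranked within the top_n of both lists.

    Returns (protein, rank by composition, rank by peptide mass fingerprint),
    ordered by the sum of the two ranks (an assumed ordering).
    """
    rank_a = {name: rank for rank, name
              in enumerate(composition_ranked[:top_n], start=1)}
    shared = [
        (name, rank_a[name], rank_b)
        for rank_b, name in enumerate(fingerprint_ranked[:top_n], start=1)
        if name in rank_a
    ]
    return sorted(shared, key=lambda item: (item[1] + item[2], item[0]))
```

An empty result is itself informative: such a spot is a candidate for the further protein characterisation and DNA sequencing effort mentioned above.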

CHARACTERISATION OF POST-TRANSLATIONAL MODIFICATIONS

Many proteins are modified after translation. Such post-translational
modifications, including glycosylation, phosphorylation, and sulfation (see
Table 6), are usually necessary for protein function or stability. Some
abnormal modifications are associated with disease (Duthel and Revol, 1993;
Ghosh et al., 1993; Yamashita et al., 1993). In proteome studies,
post-translational modifications can be examined on all proteins present, or
on individual spots. Studies on all proteins provide an indication of which
proteins may carry a certain type of modification. For example, 2-D gel
analysis of cell cultures grown in the presence of [3H]mannose or
[32P]phosphate gives an indication of which proteins carry glycans containing
mannose, and which proteins are phosphorylated (Garrels and Franza, 1989).
Lectin-binding studies of 2-D gels blotted to PVDF or nitrocellulose provide
information on the saccharides, if any, that are carried by proteins present
(Gravel et al., 1994).

When individual proteins of interest carrying post-translational modifications
have been found, micropreparative 2-D electrophoresis can be used to purify
them in microgram quantities (Hanash et al., 1991; Bjellqvist et al., 1993b).
If protein isoforms of similar MW and pI are to be studied, focusing with
narrow range pI gradients (1 pH unit) can provide greater separation and
resolution. After electrophoresis, the type and degree of protein
phosphorylation can be investigated (Murthy and Iqbal, 1991; Gold et al.,
1994), monosaccharide composition can be determined (Weitzhandler et al.,
1993; Packer et al., 1995), and the structure and exact site of glycoamino
acids can be investigated by either Edman degradation based techniques or by
mass spectrometry (Pisano et al., 1993; Huberty et al., 1993; Carr, Huddleston
and Bean, 1993). With further development of rapid techniques, investigation
of phosphorylation and monosaccharides by chromatographic or mass
spectrometric means is likely to become a routine step in the characterisation
of post-translational modifications of proteins from reference maps.
The status of proteome projects

Many technical aspects of proteome research have already been [...]. Advances
in proteome projects will initially rely on [...] amino acid [...] each
protein spot. Table 7 shows genome [...] number of proteins already defined
for a number of model organisms. [...] genome sequencing programs for E. coli
and [...] that of some other genomes (and especially the [...]) [...] that 2-D
reference maps and proteome projects of single cell organisms like
[...]plasma sp., E. coli and S. cerevisiae will be the [...] (Wasinger et al.,
1995; VanBogelen et al., [...]) [...] maps of other organisms will take longer
to [...]. However, cross-species protein identification techniques will allow
[...] and simple eukaryotes to be partially defined in [...].

[Table 7. [...] reference maps for some model organisms. Genome size data from
[...]. Only the first column heading, "Species Name", survives; the table body
is not recoverable from the source.]

The study of vertebrate proteomes and vertebrate development is a [...]
undertaking in comparison to the investigation of single cell organisms. This
is because vast numbers of proteins are developmentally [...] hundreds of
unique proteins, and [...]. However, it is estimated that at least 35% of
proteins in vertebrate [...] to tissue, constituting the 'housekeeping'
proteins (Bird [...]). [...] proteins constituting a set that are specific to
a cell type. [...] electrophoretic conditions are [...] accelerates the
definition of the 'housekeeping' proteins as well as of proteins that are
unique to different tissue types. Such studies [...]. Post-translational
modifications [...] on the same gene product in different tissues. [...]
FUTURE DIRECTIONS OF PROTEOME PROJECTS
This review has described recent advances in [...] of proteomes [...]
separation, identification and analysis of [...] possible the establishment of
detailed reference maps for organisms [...] becoming the method of choice for
the definition of the state of [...] investigation of gene expression therein
[...].

Proteome projects are already impacting on the dogma of molecular biology that
DNA sequence constitutes the definition of an organism. For example, the
proteomes of different tissues of a single organism are often [...].
Cross-species identification of proteins [...] from Candida albicans by
comparison with S. cerevisiae [...] organisms that are poorly molecularly
defined. As [...] identification can proceed at a pace orders of magnitude
faster than [...] defining the gene and protein complement [...] sequencing of
genomes will be avoided, and [...].

Just as genome sequencing is not an end in itself, neither is an annotated 2-D
protein reference map of an organism, nor indeed the identification of
proteins [...]. So whilst an immediate aim of proteome projects is to screen
proteins in reference maps, this will lead to expression studies and
characterisation of post-translational modifications. The challenge that then
needs to be addressed is the [...] structure and function of proteins in a
proteome. The [...] illustrated by the fact that over half the open reading
frames identified in [...] III were initially of no known function ([...]).
Proteome projects are [...] but will lead to [...]